Gradient Descent with Proximal Average for Nonconvex and Composite Regularization

Authors

  • Leon Wenliang Zhong
  • James T. Kwok
Abstract

Sparse modeling has been highly successful in many real-world applications. While much of the interest has been in convex regularization, recent studies show that nonconvex regularizers can outperform their convex counterparts in many situations. However, the resulting nonconvex optimization problems are often challenging, especially for composite regularizers such as the nonconvex overlapping group lasso. In this paper, by using a recent mathematical tool known as the proximal average, we propose a novel proximal gradient descent method for optimization with a wide class of nonconvex and composite regularizers. Instead of directly solving the proximal step associated with a composite regularizer, we average the solutions from the proximal problems of the constituent regularizers. This simple strategy has guaranteed convergence and low per-iteration complexity. Experimental results on a number of synthetic and real-world data sets demonstrate the effectiveness and efficiency of the proposed optimization algorithm, and also the improved classification performance resulting from the nonconvex regularizers.

Introduction

Risk minimization is a fundamental tool in machine learning. It admits a tradeoff between the empirical loss and regularization:

min_{x ∈ ℝ^d} f(x) ≡ ℓ(x) + r(x),    (1)

where ℓ is the loss and r is a regularizer on the parameter x. In particular, sparse modeling, which uses a sparsity-inducing regularizer for feature selection, has achieved great success in many real-world applications. A well-known sparsity-inducing regularizer is the ℓ1-regularizer. As a surrogate of the ℓ0-norm, it induces a sparse solution simultaneously with learning (Tibshirani 1996). When the features have some intrinsic structure, more sophisticated structured-sparsity-inducing regularizers (such as the group lasso regularizer (Yuan and Lin 2006)) can be used. More examples can be found in (Bach et al. 2011; Combettes and Pesquet 2011) and references therein. Existing sparsity-inducing regularizers are often convex. Together with a convex loss, this leads to a convex optimization problem with a globally optimal solution.

Despite such extensive popularity, convexity does not necessarily imply good prediction performance or feature selection. Indeed, it has been shown that the lasso may lead to over-penalization and suboptimal feature selection (Zhang 2010b; Candes, Wakin, and Boyd 2008). To overcome this problem, several nonconvex variants have recently been proposed, such as the capped-ℓ1 penalty (Zhang 2010b), the log-sum penalty (LSP) (Candes, Wakin, and Boyd 2008), the smoothly clipped absolute deviation (SCAD) (Fan and Li 2001), and the minimax concave penalty (MCP) (Zhang 2010a). For more sophisticated scenarios, recent research demonstrates that nonconvex regularizers, such as the nonconvex group lasso (Xiang, Shen, and Ye 2013; Chartrand and Wohlberg 2013), the matrix MCP norm (Wang, Liu, and Zhang 2013), and grouping pursuit (Shen and Huang 2010), can outperform their convex counterparts. However, these nonconvex models often yield challenging optimization problems. As most of them can be rewritten as f1 − f2, a difference of two convex functions f1 and f2 (Gong et al. 2013), a popular optimization approach is multi-stage convex programming, which recursively approximates f2 while leaving f1 intact (Zhang 2010b; Zhang et al. 2013; Xiang, Shen, and Ye 2013).
However, it involves nonlinear optimization in each iteration and is thus expensive in general. The sequential convex programming (SCP) method (Lu 2012) further approximates the smooth part of f1 so that the update can be more efficient for simple regularizers such as the capped-ℓ1. However, it is often trapped in a poor local optimum (Gong et al. 2013). Recently, a general iterative shrinkage and thresholding (GIST) framework was proposed (Gong et al. 2013), which shows promising performance on a class of nonconvex penalties. However, for composite regularizers such as the nonconvex variants of the overlapping group lasso (Zhao, Rocha, and Yu 2009), the generalized lasso (Tibshirani, Hoefling, and Tibshirani 2011), and a combination of the ℓ1- and trace norms (Richard, Savalle, and Vayatis 2012), both SCP and GIST are inefficient, as the underlying proximal steps for these composite regularizers are very difficult. In this paper, we propose a simple algorithm called Gradient Descent with Proximal Average of Nonconvex functions (GD-PAN), together with its line-search-based variant GD-PAN-LS, which are suitable for a wide class of nonconvex and composite regularization problems. We first extend a recent optimization tool called "proximal average" (Yu 2013; …
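To make the proximal-average idea concrete, the following is a minimal Python sketch of a single gradient-descent-with-proximal-average step. It is not the authors' implementation: the constituent regularizers here are convex (an ℓ1 term and overlapping group-ℓ2 terms) rather than the nonconvex penalties studied in the paper, and the function names, step size, and toy least-squares setup are illustrative assumptions. What it shows is the averaging strategy described in the abstract: take a gradient step on the smooth loss, solve the simple proximal problem of each constituent regularizer separately, and average the resulting solutions instead of solving the hard proximal step of the composite regularizer.

```python
import numpy as np

# Minimal sketch of one gradient-descent-with-proximal-average step.
# Assumption: convex constituents (l1 + overlapping group-l2) stand in for the
# nonconvex penalties used in the paper; everything here is illustrative.

def prox_l1(v, t):
    """Proximal operator of t * ||.||_1 (soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def prox_group(v, t, group):
    """Proximal operator of t * ||v_group||_2 (block soft-thresholding)."""
    out = v.copy()
    norm = np.linalg.norm(v[group])
    out[group] = (max(0.0, 1.0 - t / norm) * v[group]) if norm > 0 else 0.0
    return out

def gd_pan_step(x, grad_loss, eta, lam, groups):
    """One illustrative step: gradient descent on the loss, then average the
    solutions of the constituent proximal problems (the proximal-average idea)."""
    v = x - eta * grad_loss(x)                       # gradient step on the smooth loss
    sols = [prox_l1(v, eta * lam)]                   # prox of the l1 constituent
    sols += [prox_group(v, eta * lam, g) for g in groups]  # prox of each group constituent
    return np.mean(sols, axis=0)                     # average instead of composite prox

# Toy usage: least-squares loss with two overlapping groups.
rng = np.random.default_rng(0)
A, b = rng.standard_normal((20, 6)), rng.standard_normal(20)
grad = lambda x: A.T @ (A @ x - b)
eta = 1.0 / np.linalg.norm(A, 2) ** 2                # 1/L for the L-smooth loss
x = np.zeros(6)
for _ in range(200):
    x = gd_pan_step(x, grad, eta, lam=0.5,
                    groups=[np.arange(0, 4), np.arange(2, 6)])
```

Each constituent prox has a closed form, so the per-iteration cost stays low even though the composite prox (ℓ1 plus overlapping groups) has no simple closed-form solution; this is the per-iteration efficiency argument made in the abstract.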


Similar articles

A Proximal Gradient Algorithm for Decentralized Composite Optimization over Directed Networks

This paper proposes a fast decentralized algorithm for solving a consensus optimization problem defined in a directed networked multi-agent system, where the local objective functions have the smooth+nonsmooth composite form, and are possibly nonconvex. Examples of such problems include decentralized compressed sensing and constrained quadratic programming problems, as well as many decentralize...


A SMART Stochastic Algorithm for Nonconvex Optimization with Applications to Robust Machine Learning

In this paper, we show how to transform any optimization problem that arises from fitting a machine learning model into one that (1) detects and removes contaminated data from the training set while (2) simultaneously fitting the trimmed model on the uncontaminated data that remains. To solve the resulting nonconvex optimization problem, we introduce a fast stochastic proximal-gradient algorith...


Nonconvex proximal splitting with computational errors∗

Throughout this chapter, ‖·‖ denotes the standard Euclidean norm. Problem (1) generalizes the more thoroughly studied class of composite convex optimization problems [30], a class that has witnessed huge interest in machine learning, signal processing, statistics, and other related areas. We refer the interested reader to [2, 3, 21, 37] for several convex examples and recent references. A threa...


Geometric Descent Method for Convex Composite Minimization

In this paper, we extend the geometric descent method recently proposed by Bubeck, Lee and Singh [5] to solving nonsmooth and strongly convex composite problems. We prove that the resulting algorithm, GeoPG, converges with a linear rate (1− 1/√κ), thus achieves the optimal rate among first-order methods, where κ is the condition number of the problem. Numerical results on linear regression and ...


Implicit Regularization in Nonconvex Statistical Estimation: Gradient Descent Converges Linearly for Phase Retrieval, Matrix Completion and Blind Deconvolution

Recent years have seen a flurry of activity in designing provably efficient nonconvex procedures for solving statistical estimation problems. Due to the highly nonconvex nature of the empirical loss, state-of-the-art procedures often require proper regularization (e.g. trimming, regularized cost, projection) in order to guarantee fast convergence. For vanilla procedures such as gradient descen...



Journal: Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence

Volume:   Issue:

Pages:  -

Publication date: 2014